Inferring an Original Sequence from Erroneous Copies: A Bayesian Approach
Abstract
This paper considers the problem of inferring an original sequence from a number of erroneous copies. The problem arises in DNA sequencing, particularly in the context of emerging technologies that provide high throughput or other advantages, but at the cost of introducing many errors. We develop a Bayesian probabilistic model of the introduction of errors, and search for a sequence that has maximum posterior probability with respect to the model. We present results of extensive tests in which error-prone sequencing of real DNA was simulated. The results obtained using the new approach are compared to results obtained by deriving a consensus sequence from a multiple sequence alignment. We find that a significant improvement in accuracy is obtained using the new approach. The implication is that high error levels need not be a barrier to the adoption of sequencing technologies that are in other respects promising, because most errors can be detected and corrected using a small number of reads.
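The contrast between a maximum-posterior estimate and a plain consensus can be illustrated with a toy sketch. It is not the paper's model: it assumes substitution-only errors, reads that are already aligned and of equal length, a known error rate per read, and a uniform prior over bases. The names `map_sequence`, `majority_consensus`, and `error_rates` are hypothetical and chosen only for this illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def map_sequence(reads, error_rates, prior=None):
    """Per-position MAP estimate under a substitution-only error model.

    Assumptions (illustrative, not from the paper): reads are pre-aligned
    and equal in length, each read has a known error rate, and an error
    replaces the true base with one of the other three bases uniformly.
    """
    if prior is None:
        prior = {b: 1.0 / len(ALPHABET) for b in ALPHABET}
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("this sketch assumes equal-length, pre-aligned reads")

    result = []
    for i in range(length):
        best_base, best_score = None, -math.inf
        for base in ALPHABET:
            # log posterior (up to a constant) = log prior + sum of log likelihoods
            score = math.log(prior[base])
            for read, e in zip(reads, error_rates):
                if read[i] == base:
                    score += math.log(1.0 - e)
                else:
                    score += math.log(e / (len(ALPHABET) - 1))
            if score > best_score:
                best_base, best_score = base, score
        result.append(best_base)
    return "".join(result)


def majority_consensus(reads):
    """Column-wise majority vote over pre-aligned, equal-length reads."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


# One accurate read disagrees with two noisy reads at the third position.
reads = ["ACGT", "ACTT", "ACTT"]
error_rates = [0.01, 0.4, 0.4]
map_result = map_sequence(reads, error_rates)
consensus_result = majority_consensus(reads)
```

In this toy case the majority vote follows the two noisy reads, while the posterior weights the accurate read more heavily and keeps its base. A realistic model would also have to handle insertions and deletions, which this sketch deliberately leaves out.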
Similar resources
Inferring a DNA Sequence from Erroneous Copies
We suggest a novel approach for efficiently reconstructing an original DNA sequence from erroneous copies.
Bayesian approach to inference of population structure
Methods for inferring population structure, and their applications in identifying disease models and in forecasting people's physical and mental condition, are becoming increasingly important. In this article, the motivation for and significance of studying population structure are explained first. In the next section, the applications of inference of p...
Improving the Performance of Bayesian Estimation Methods in Estimations of Shift Point and Comparison with MLE Approach
A Bayesian analysis is used to detect a change point in a sequence of independent random variables from exponential distributions. In this paper, we try to estimate a change point that occurs in a sequence of independent exponential observations. The Bayes estimators are derived for the change point, the rate of the exponential distribution before the shift, and the rate of exponential distribution after s...
Author gender identification from text using Bayesian Random Forest
The heavy use of virtual environments and the connections users form through social networks such as Facebook, Instagram, and Twitter make it more necessary than ever to identify shared subjects in these environments. Several applications benefit from reliable methods for inferring the age and gender of social media users. Such applications exist across a wide area of fields,...
Molecular Identification of the Persian Gulf Sea Hare (Aplysia sp.) Based on 16s rRNA Gene Sequence
Background: Sea hares of the Aplysia genus are among the mollusks of interest for various researchers to study their phylogeny, bioactive compounds and the nervous system. These mollusks are herbivorous and produce chemical compounds (ink) to defend themselves. The present study provided molecular identification of the Persian Gulf (Bushehr city) sea hare using 16s rRNA gene sequence. Materials...
Publication date: 2003